{
 "metadata": {
  "name": "",
  "signature": "sha256:fe1302da45d854dff9c9c7047d5a2581957ab517a1bc5cedbe70cc2f8d954cd6"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Applying Naive Bayes classification to spam filtering"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's pretend we have an email with three words: \"Send money now.\" We want to classify that email as **ham or spam.**\n",
      "\n",
      "We'll use Naive Bayes classification:\n",
      "\n",
      "$$P(spam | \\text{send money now}) = \\frac {P(\\text{send money now} | spam) \\times P(spam)} {P(\\text{send money now})}$$\n",
      "\n",
      "By assuming that the features (the words) are **conditionally independent**, we can simplify the likelihood function:\n",
      "\n",
      "$$P(spam | \\text{send money now}) \\approx \\frac {P(\\text{send} | spam) \\times P(\\text{money} | spam) \\times P(\\text{now} | spam) \\times P(spam)} {P(\\text{send money now})}$$\n",
      "\n",
      "We could calculate all of the values in the numerator by examining a corpus of **spam email**:\n",
      "\n",
      "$$P(spam | \\text{send money now}) \\approx \\frac {0.2 \\times 0.1 \\times 0.1 \\times 0.9} {P(\\text{send money now})} = \\frac {0.0018} {P(\\text{send money now})}$$\n",
      "\n",
      "We could repeat this process with a corpus of **ham email**:\n",
      "\n",
      "$$P(ham | \\text{send money now}) \\approx \\frac {0.05 \\times 0.01 \\times 0.1 \\times 0.1} {P(\\text{send money now})} = \\frac {0.000005} {P(\\text{send money now})}$$\n",
      "\n",
       "All we care about is which class has the **higher probability**. Both expressions share the same denominator, so we only need to compare the numerators: since $0.0018 > 0.000005$, we predict that the email is **spam**."
     ]
    },
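     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "As a quick sanity check, we can compute both numerators directly in Python, using the illustrative conditional probabilities from the equations above:"
      ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
       "# illustrative values from the equations above\n",
       "spam_numerator = 0.2 * 0.1 * 0.1 * 0.9    # P(send|spam) * P(money|spam) * P(now|spam) * P(spam)\n",
       "ham_numerator = 0.05 * 0.01 * 0.1 * 0.1   # P(send|ham) * P(money|ham) * P(now|ham) * P(ham)\n",
       "print(spam_numerator, ham_numerator)\n",
       "print('spam' if spam_numerator > ham_numerator else 'ham')"
      ],
      "language": "python",
      "metadata": {},
      "outputs": []
     },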
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Key takeaways\n",
      "\n",
      "- The **\"naive\" assumption** of Naive Bayes (that the features are conditionally independent) is critical to making these calculations simple.\n",
      "- The **normalization constant** (the denominator) can be ignored since it's the same for all classes.\n",
       "- The **prior probability** has little influence once there are many features, since the product of likelihoods quickly dominates it.\n",
       "- The Naive Bayes classifier handles **irrelevant features** well, since their conditional probabilities are similar across classes and roughly cancel out."
     ]
    },
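     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "To make the first two takeaways concrete, here is a minimal sketch (with an invented toy corpus) of estimating the conditional probabilities by counting words. Working in log space with Laplace (add-one) smoothing keeps the computation stable and avoids zero probabilities for unseen words, and the denominator is never computed at all:"
      ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
       "import math\n",
       "from collections import Counter\n",
       "\n",
       "# hypothetical toy corpus, invented purely for illustration\n",
       "spam_docs = [\"send money now\", \"money now\", \"send cash\"]\n",
       "ham_docs = [\"hi how are you\", \"see you now\", \"send the report\"]\n",
       "\n",
       "spam_counts = Counter(w for d in spam_docs for w in d.split())\n",
       "ham_counts = Counter(w for d in ham_docs for w in d.split())\n",
       "vocab = set(spam_counts) | set(ham_counts)\n",
       "\n",
       "def log_score(words, counts, prior):\n",
       "    # log prior plus sum of log-likelihoods with Laplace (add-one) smoothing\n",
       "    total = sum(counts.values())\n",
       "    return math.log(prior) + sum(math.log((counts[w] + 1) / (total + len(vocab))) for w in words)\n",
       "\n",
       "words = \"send money now\".split()\n",
       "prior_spam = len(spam_docs) / (len(spam_docs) + len(ham_docs))\n",
       "spam_score = log_score(words, spam_counts, prior_spam)\n",
       "ham_score = log_score(words, ham_counts, 1 - prior_spam)\n",
       "print(\"spam\" if spam_score > ham_score else \"ham\")"
      ],
      "language": "python",
      "metadata": {},
      "outputs": []
     },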
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Comparing Naive Bayes with other models\n",
      "\n",
      "Advantages of Naive Bayes:\n",
      "\n",
      "- Model training and prediction are very fast\n",
      "- No tuning is required\n",
      "- Features don't need scaling\n",
      "- Handles irrelevant features well\n",
      "- Performs better than logistic regression when the training set is very small\n",
      "\n",
      "Disadvantages of Naive Bayes:\n",
      "\n",
      "- Interpretability is limited\n",
      "- Predicted probabilities are not well-calibrated\n",
       "- Has a higher \"asymptotic error\" than logistic regression (it converges quickly, but given enough training data, logistic regression eventually outperforms it)\n",
      "- Can't automatically learn feature interactions"
     ]
    }
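     ,
     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "In practice, scikit-learn's `MultinomialNB` illustrates the first two advantages: training is a single fast pass over the word counts, and no tuning is required. A minimal sketch (the four training emails below are invented for illustration; the smoothing parameter `alpha` is left at its default):"
      ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
       "from sklearn.feature_extraction.text import CountVectorizer\n",
       "from sklearn.naive_bayes import MultinomialNB\n",
       "\n",
       "# hypothetical toy training data, invented for illustration\n",
       "X_text = [\"send money now\", \"cheap money send\", \"meeting at noon\", \"see you at lunch\"]\n",
       "y = [\"spam\", \"spam\", \"ham\", \"ham\"]\n",
       "\n",
       "vect = CountVectorizer()\n",
       "X = vect.fit_transform(X_text)   # document-term matrix of word counts\n",
       "\n",
       "nb = MultinomialNB()             # no tuning required; default alpha=1.0 is Laplace smoothing\n",
       "nb.fit(X, y)\n",
       "print(nb.predict(vect.transform([\"send money now\"]))[0])"
      ],
      "language": "python",
      "metadata": {},
      "outputs": []
     }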
   ],
   "metadata": {}
  }
 ]
}